Shrinkage Estimation for SAGE Data using a Mixture Dirichlet Prior
Abstract
Serial Analysis of Gene Expression (SAGE) is a technique for estimating the gene expression profile of a biological sample. Any efficient inference in SAGE must be based on efficient estimates of these gene expression profiles, which consist of the estimated relative abundances of each mRNA species present in the sample. The data from SAGE experiments are counts for each observed mRNA species, and can be modeled using a multinomial distribution with two characteristics: skewness in the distribution of relative abundances and a small sample size relative to the dimension. As a result of these characteristics, a given SAGE sample will fail to capture a large number of the expressed mRNA species present in the tissue. Standard empirical estimates of the relative abundances effectively ignore these missing, unobserved species, and consequently also tend to overestimate the abundance of the scarce observed species, which make up the vast majority of the total. In this chapter, we review a new Bayesian procedure that yields improved estimates for the missing and scarce species without trading off much efficiency for the abundant species. The key to the procedure is the mixture Dirichlet prior, which stochastically partitions the mRNA species into abundant and scarce strata, with each stratum modeled by its own multivariate prior, a scalar multiple of a symmetric Dirichlet. Simulation studies demonstrate that the resulting shrinkage estimators have efficiency advantages over the MLE in the simulated SAGE scenarios.
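To make the contrast between the empirical estimate and a Bayesian alternative concrete, the following is a minimal Python sketch. It is not the mixture Dirichlet procedure described in the abstract: it uses a single symmetric Dirichlet prior, assumes the total number of species is known, and the counts and the value of `alpha` are hypothetical. It only illustrates that the MLE assigns zero abundance to unobserved species, whereas a Dirichlet posterior mean reserves some probability mass for them.

```python
def mle(counts):
    """Empirical (maximum likelihood) relative abundances: count / total."""
    n = sum(counts)
    return [c / n for c in counts]


def dirichlet_posterior_mean(counts, alpha):
    """Posterior mean under a single symmetric Dirichlet(alpha, ..., alpha) prior.

    Assumption: len(counts) is the known number of species, including
    species with a zero count in this sample.
    """
    n = sum(counts)
    k = len(counts)
    return [(c + alpha) / (n + k * alpha) for c in counts]


# Hypothetical skewed sample: a few abundant species, several scarce
# ones, and three species that were expressed but never observed.
counts = [50, 30, 10, 5, 3, 1, 1, 0, 0, 0]
alpha = 0.5  # hypothetical prior weight, chosen only for illustration

p_mle = mle(counts)
p_shrunk = dirichlet_posterior_mean(counts, alpha)

for c, a, b in zip(counts, p_mle, p_shrunk):
    print(f"count={c:3d}  mle={a:.4f}  posterior_mean={b:.4f}")
```

Under this single prior every estimate is pulled toward the uniform value 1/k, which is a cruder adjustment than stratifying the species into abundant and scarce groups with separate priors, as the abstract describes.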
Similar resources
Bayesian shrinkage estimation of the relative abundance of mRNA transcripts using SAGE.
Serial analysis of gene expression (SAGE) is a technology for quantifying gene expression in biological tissue that yields count data that can be modeled by a multinomial distribution with two characteristics: skewness in the relative frequencies and small sample size relative to the dimension. As a result of these characteristics, a given SAGE sample may fail to capture a large number of expre...
Classic and Bayes Shrinkage Estimation in Rayleigh Distribution Using a Point Guess Based on Censored Data
Introduction: In classical methods of statistics, the parameter of interest is estimated based on a random sample using natural estimators such as maximum likelihood or unbiased estimators (sample information). In practice, the researcher has prior information about the parameter in the form of a point guess value. Information in the guess value is called nonsample information. Thomp...
Positive-Shrinkage and Pretest Estimation in Multiple Regression: A Monte Carlo Study with Applications
Consider the problem of predicting a response variable using a set of covariates in a linear regression model. If it is known or suspected a priori that a subset of the covariates does not significantly contribute to the overall fit of the model, a restricted model that excludes these covariates may be sufficient. If, on the other hand, the subset provides useful information, shrinkage meth...
Predictive performance of Dirichlet process shrinkage methods in linear regression
An obvious Bayesian nonparametric generalization of ridge regression assumes that coefficients are exchangeable, from a prior distribution of unknown form, which is given a Dirichlet process prior with a normal base measure. The purpose of this paper is to explore predictive performance of this generalization, which does not seem to have received any detailed attention, despite related applicat...
Dirichlet Process Mixtures of Beta Distributions, with Applications to Density and Intensity Estimation
We propose a class of Bayesian nonparametric mixture models with a Beta distribution providing the mixture kernel and a Dirichlet process prior assigned to the mixing distribution. Motivating applications include density estimation on bounded domains, and inference for non-homogeneous Poisson processes over time. We present the mixture model formulation, discuss prior specification, and develop...